Research Report

Draft Genome Sequence of Bacillus thuringiensis Strain S3076-1  

Wu Zhongqi1,2, Zhou Yan2,5, Liu Panpan1,2, Wei Yanjun6, Zhang Yan6, Li Youzhi5, Liu Shenkui1, Fang Xuanjun1,2,3,4,5
1 Alkali Soil Natural Environmental Source Center, Northeast Forestry University, Harbin, 150040
2 Hainan Institute of Tropical Agricultural Resources, Sanya, 572025
3 Institute of Life Science, Jiyang College of Zhejiang A&F University, Zhuji, 311800
4 Cuixi Academy of Biotechnology, Zhuji, 311800
5 College of Life Sciences and Technology, Guangxi University, Nanning, 530005
6 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150040
Bt Research, 2016, Vol. 7, No. 1   doi: 10.5376/bt.2016.07.0001
Received: 16 Nov., 2016    Accepted: 20 Nov., 2016    Published: 22 Nov., 2016
© 2016 BioPublisher Publishing Platform
This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:

Wu et al., 2016, Draft genome sequence of Bacillus thuringiensis strain S3076-1, Bt Research, Vol.7, No.1 1-7 (doi: 10.5376/bt.2016.07.0001)

Abstract

Bacillus thuringiensis (Bt) strain S3076-1 produces parasporal crystals of various shapes, which may have insecticidal activity against Lepidoptera. In this study, the whole genome of Bt S3076-1 was sequenced de novo on the Illumina HiSeq 2000 platform, generating 1.42 Gb of raw data. After quality control and filtering of the raw reads, short-read assembly with SOAPdenovo2 yielded 231 scaffolds with a total length of 6.42 Mb, an N50 of 97,325 bp and a GC content of 34.81%. Using the genomes of 15 Bt strains in the NCBI database as references, the scaffolds were assembled into three replicons: a chromosome of 5.72 Mb with a GC content of about 35.00%, and two plasmids of 181 kb and 518 kb with GC contents of 33.10% and 33.13%, respectively. Based on the genome sequence and annotation, a circular map of the draft genome was drawn with the CGView Server. The whole-genome sequence and draft genome of Bt S3076-1 provide a basis for functional studies of the strain and for identifying its insecticidal toxin proteins.

Keywords
Bacillus thuringiensis; Bt S3076-1; Whole genome sequencing; Genome assembling; Draft genome

Introduction

Bacillus thuringiensis (Bt) is an aerobic, Gram-positive bacterium that is widely distributed in diverse environments, such as soil, water, dead insects, mammals and necrotic human tissue (Höfte and Whiteley, 1989; Roh et al., 2007; Raymond et al., 2010). In the late growth stage, Bt produces parasporal crystal proteins with insecticidal activity (Ibrahim et al., 2010). These insecticidal crystal proteins (ICPs) have so far been found to be active against Lepidoptera, Coleoptera, Diptera and Hymenoptera, as well as protozoa and nematodes (de Maagd et al., 2001). Because it is highly specific, highly effective and environmentally friendly, Bt has been widely used for pest prevention and control, biological control of agricultural pests and the development of transgenic insect-resistant crops, with considerable economic and environmental benefits.

 

Bt strain S3076-1 is a wild-type strain isolated from Hainan Diaoluoshan National Nature Reserve by the Hainan Institute of Tropical Agricultural Resources (HITAR). Scanning electron microscopy showed that the strain produced abundant parasporal crystals in the late growth stage, mainly spherical and square in shape. SDS-PAGE showed parasporal crystal proteins with molecular weights of 140, 90, 70, 50, 45 and 25 kDa. Based on the shape and quantity of the parasporal crystals, we predicted that Bt strain S3076-1 may be insecticidal to Lepidopteran insects. However, the types of insecticidal proteins carried by the strain were unknown, so we commissioned the Chinese National Human Genome Center at Shanghai (CHGC) to sequence the whole genome of Bt strain S3076-1 on the Illumina HiSeq 2000 platform.

 

As of December 2016, genome assembly and annotation records for 89 Bt strains had been deposited in the NCBI database, comprising complete genomes (36), chromosome-level assemblies (16), scaffold-level assemblies (16) and contig-level assemblies (16) (https://www.ncbi.nlm.nih.gov/genome/genomes/486?). Complete genomes therefore make up a relatively small share of the published Bt genomic data; most data are available only as contigs or scaffolds. With advances in whole-genome sequencing technology and falling sequencing costs, more and more research groups are sequencing Bt genomes, and deep sequencing now yields higher-quality genomic data for studying Bt individuals and populations at the genome level. In this study, whole-genome sequencing of Bt strain S3076-1 was completed by CHGC on the Illumina HiSeq 2000 platform, and the raw reads were assembled into scaffolds. We then constructed the genome map of Bt strain S3076-1 from the scaffolds, using bioinformatic methods based on sequence similarity between strains of the same species and a local optimal search algorithm. These results will support functional analysis of the Bt strain S3076-1 genome and provide an important reference for mining novel Bt toxin proteins.

 

1 Results and Analysis

1.1 Sequencing data statistics and contig/scaffold assembly

De novo whole-genome sequencing of Bt strain S3076-1 produced a total of 1.42 Gb of raw data comprising 3,790,009 reads. After quality control, filtering and short-read assembly with SOAPdenovo2, we obtained 231 scaffolds with a total length of 6,424,195 bp, an N50 of 97,325 bp and a GC content of 34.81% (Table 1).
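As a rough check on how much data 1.42 Gb represents for a 6.42 Mb genome, the average sequencing depth is the total number of sequenced bases divided by the genome length. The sketch below assumes that 1.42 Gb is the total number of raw bases and uses the scaffold total as a stand-in for the genome size; the constant and function names are illustrative, not part of the original analysis.

```python
# Values reported in the text; treating 1.42 Gb as total raw bases is an assumption.
RAW_BASES = 1.42e9            # total raw sequence, in bases
ASSEMBLY_LENGTH = 6_424_195   # total length of the 231 scaffolds, in bp
CLEAN_FRACTION = 0.979        # share of data retained after filtering (Discussion)


def mean_depth(total_bases: float, genome_length: int) -> float:
    """Average fold coverage: sequenced bases divided by genome size."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_bases / genome_length


raw_depth = mean_depth(RAW_BASES, ASSEMBLY_LENGTH)
clean_depth = mean_depth(RAW_BASES * CLEAN_FRACTION, ASSEMBLY_LENGTH)
print(f"raw depth ~ {raw_depth:.0f}x, clean depth ~ {clean_depth:.0f}x")
```

Under these assumptions the raw data correspond to roughly 221-fold coverage, and about 216-fold after filtering.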

 

 

Table 1 Statistics of the assembled scaffolds of Bt S3076-1

 

1.2 Selection of reference genome

At present, 89 Bt strains with genomic information are registered in the NCBI database: 36 strains have complete genomes, 16 have chromosome-level assemblies, and 37 have scaffold- or contig-level assemblies. A total of 53 Bt strains with complete or chromosome-level genomes were selected as candidate reference genomes, and their sequence similarity to Bt strain S3076-1 was analyzed with ncbi-blast-2.2.27. The alignment results were evaluated comprehensively on four criteria: percent sequence identity, length of the aligned sequence, number of mismatches and number of gaps. On this basis, the genomes of 15 Bt strains (97-27, BMB171, CT-43, HD73, IS5056, YBT-020, str. Al_Hakam, BGSC_4AW1, BGSC_4BD1, BGSC_4CC1, BGSC_4Y1, T03a001, T01001, T04001 and Bt407) were selected as references for assembling the Bt strain S3076-1 genome (Table 2). The Bt strain S3076-1 genome was then assembled against these 15 reference genomes with diya-assemble_pseudocontig.pl, the pseudo-contig assembly script of the DIYA genome pipeline (Figure 1).
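The four criteria above correspond to columns of BLAST's default tabular output (-outfmt 6: pident, length, mismatch, gapopen). The sketch below aggregates those columns per reference sequence and ranks the candidates. The ordering rule (total aligned length first, then identity, then fewer mismatches and gap openings) is a hypothetical illustration; the text does not state how the authors weighted the criteria. BLAST reports subject sequence IDs, so per-strain totals would additionally require mapping sequence IDs to strains.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class RefSummary:
    aligned_bp: int = 0        # sum of alignment lengths (column 4)
    identical_bp: float = 0.0  # pident / 100 * length, summed over hits
    mismatches: int = 0        # column 5
    gap_opens: int = 0         # column 6

    @property
    def identity(self) -> float:
        """Alignment-length-weighted percent identity."""
        return 100.0 * self.identical_bp / self.aligned_bp if self.aligned_bp else 0.0


def summarize_outfmt6(lines: Iterable[str]) -> Dict[str, RefSummary]:
    """Aggregate default BLAST tabular rows (-outfmt 6) by subject ID (column 2)."""
    summaries: Dict[str, RefSummary] = defaultdict(RefSummary)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        s = summaries[f[1]]
        length = int(f[3])
        s.aligned_bp += length
        s.identical_bp += float(f[2]) / 100.0 * length
        s.mismatches += int(f[4])
        s.gap_opens += int(f[5])
    return dict(summaries)


def rank_references(summaries: Dict[str, RefSummary], top_n: int = 15) -> List[str]:
    """Hypothetical ordering: longest total alignment, then highest identity,
    then fewest mismatches and gap openings."""
    ordered = sorted(
        summaries.items(),
        key=lambda kv: (-kv[1].aligned_bp, -kv[1].identity,
                        kv[1].mismatches, kv[1].gap_opens),
    )
    return [name for name, _ in ordered[:top_n]]
```

A sort key of tuples keeps the criteria explicit and easy to reorder if a different priority is preferred.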

 

 

Table 2 Genomic information of reference strains

 

 

Figure 1 DIYA genome assembly

Note: Red labels, strains with serological markers; blue labels, strains without serological markers; speckled labels, genome data

 

1.3 Construction of Bt strain S3076-1 draft genome

Using the genomes of the 15 Bt strains as references, the Bt strain S3076-1 genome was assembled into one chromosome and two plasmids. The chromosome is about 5.72 Mb with a GC content of 35%, and the two plasmids are 181 kb and 518 kb with GC contents of 33.10% and 33.13%, respectively (Table 3). Open reading frames (ORFs) and coding sequences (CDSs) of Bt strain S3076-1 were predicted with Glimmer and GeneMark, yielding a total of 7,700 ORFs and 6,421 CDSs. A circular map of the Bt strain S3076-1 genome was drawn with the CGView Server (Figure 2) (Grant and Stothard, 2008).
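Per-replicon GC content, and the GC skew track that circular genome maps such as those produced by CGView commonly display, are both simple base counts. A minimal sketch follows; the function names and the 10 kb default window are illustrative assumptions, and ambiguous bases (N) are excluded from the GC percentage here, which is one convention among several.

```python
from typing import List, Optional, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def gc_skew(seq: str, window: int = 10_000,
            step: Optional[int] = None) -> List[Tuple[int, float]]:
    """(G - C) / (G + C) in sliding windows; returns (window start, skew) pairs.
    A sequence shorter than the window is treated as a single window."""
    step = step or window
    seq = seq.upper()
    out = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        out.append((start, (g - c) / (g + c) if g + c else 0.0))
    return out
```

Applied to each replicon's sequence, gc_content would give the per-replicon values, while gc_skew gives the windowed track typically plotted on a circular map.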

 

 

Table 3 Genomic information of Bt S3076-1

 

 

Figure 2 Genome visualization graphical map of Bt strain S3076-1

Note: A: Chromosome of Bt strain S3076-1; B: Plasmid 1 of Bt strain S3076-1; C: Plasmid 2 of Bt strain S3076-1

 

2 Discussion

The proportion of low-quality data in whole-genome sequencing data is usually 10%-20%. In this study, whole-genome sequencing of Bt strain S3076-1 yielded 1.42 Gb of raw data, of which low-quality data accounted for only about 2.1%. This indicates that the reads were of relatively high quality; the remaining 97.9% of reads were used to assemble the 6.42 Mb genome of Bt strain S3076-1.

 

Genome assembly quality is affected by many factors: internal factors include genome complexity, GC content and plasmids, while external factors include sequencing library quality, the randomness of sequencing coverage and exogenous contamination. The N50 statistic is a common measure of assembly contiguity. Given a set of contigs (or scaffolds) sorted from longest to shortest, the N50 is the length of the sequence at which the cumulative length first reaches 50% of the total assembly length; in other words, sequences at least as long as the N50 contain half of the assembled bases. The larger the N50, the more contiguous the assembly; a smaller N50 indicates a more fragmented assembly with more gaps. In this study, the scaffold N50 was 97 kb, lower than the reference value of 300 kb commonly used for bacterial genomes, indicating that the assembly quality may not be very high.
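The N50 definition translates directly into a few lines of code: sort the lengths in descending order and return the first length at which the running total reaches half of the assembly. This is a generic sketch of the standard statistic, not the specific tool the authors used to report 97,325 bp.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half
    of the total assembly length. Returns 0 for an empty input."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

For example, for sequences of 2, 3, 4, 5, 6, 7, 8, 9 and 10 bp (54 bp in total), the running sum 10 + 9 + 8 = 27 first reaches half the total at the 8 bp sequence, so the N50 is 8 bp.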

 

Using bioinformatics software, a total of 15 Bt strain genomes were selected as references, and the genome assembly of Bt strain S3076-1 was completed. Coding gene prediction and protein function annotation were then performed on the assembled draft genome. According to the Bt genomic data published in the NCBI database, Bt genomes range from 5.29 to 6.87 Mb in size, with GC contents of 34.50%-35.62% and 1 to 14 plasmids. The draft genome shows that Bt strain S3076-1 carries three replicons: a 5.72 Mb chromosome and two plasmids of 181 kb and 518 kb. Because the genome size and GC content (34.81%) of Bt strain S3076-1 fall within these ranges, the assembly and draft genome construction of Bt strain S3076-1 appear to be accurate. The circular genome map drawn with the CGView Server also showed that the GC bias of Bt strain S3076-1 is consistent with the general features of Bt genomes.

 

These results further support the soundness of the Bt strain S3076-1 draft genome. In future work, each plasmid of Bt strain S3076-1 could be isolated to verify the draft genome at the molecular level, or the sequencing depth could be increased to close the remaining gaps and obtain a more complete genome.

 

The whole-genome sequencing and draft genome of Bt strain S3076-1 provide a basis for functional studies of the strain. Coding gene prediction and protein function annotation of Bt strain S3076-1 will help identify its toxin proteins and promote the application and development of the strain.

 

3 Materials and Methods

3.1 Sample collection and genome sequencing of Bt strain S3076-1

Bt strain S3076-1 is a wild-type strain isolated from Hainan Diaoluoshan National Nature Reserve by the Hainan Institute of Tropical Agricultural Resources (HITAR). Scanning electron microscopy showed that the strain produced abundant parasporal crystals in the late growth stage, mainly spherical and square in shape. In 2014, we commissioned the Chinese National Human Genome Center at Shanghai (CHGC) to sequence the whole genome of Bt strain S3076-1 on the Illumina HiSeq 2000 platform.

 

3.2 Pretreatment of sequencing data

The quality of the raw data was evaluated and pretreated with FastQC V0.6.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Filtering mainly involved: removing reads in which bases with quality ≤20 exceeded a set proportion (default 40%); removing reads in which N bases exceeded a set proportion (default 10%); removing reads with adapter contamination (default: a 15 bp overlap between the adapter sequence and the read); and removing duplicate reads. Only the resulting clean data were used in subsequent analyses.
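The four filtering rules can be sketched as follows, using the default thresholds given above (quality ≤20 in more than 40% of bases, N bases above 10%, a 15 bp adapter overlap, exact duplicates). This is an illustrative single-end simplification, not the pipeline the sequencing center actually ran: the Read type and function names are hypothetical, adapter matching is exact (real trimmers tolerate mismatches), and duplicates are detected by identical sequence only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Read:
    seq: str
    qual: List[int]  # Phred quality score for each base


def passes_quality(read: Read, low_q: int = 20, max_low_q_frac: float = 0.40,
                   max_n_frac: float = 0.10) -> bool:
    """Reject reads with too many low-quality (<= low_q) bases or too many Ns."""
    n = len(read.seq)
    if n == 0:
        return False
    low = sum(1 for q in read.qual if q <= low_q)
    ns = read.seq.upper().count("N")
    return low / n <= max_low_q_frac and ns / n <= max_n_frac


def has_adapter(seq: str, adapter: str, min_overlap: int = 15) -> bool:
    """True if the whole adapter occurs in the read, or the read's 3' end exactly
    matches at least `min_overlap` bases of the adapter's 5' end."""
    if adapter in seq:
        return True
    for k in range(min(len(seq), len(adapter)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return True
    return False


def filter_reads(reads: Iterable[Read], adapter: str) -> List[Read]:
    """Apply quality, N, adapter and exact-duplicate filters; return clean reads."""
    seen = set()
    clean = []
    for read in reads:
        if not passes_quality(read) or has_adapter(read.seq, adapter):
            continue
        if read.seq in seen:  # keep only the first copy of identical sequences
            continue
        seen.add(read.seq)
        clean.append(read)
    return clean
```

The adapter sequence is passed in rather than hard-coded, since the text does not say which adapters were used.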

 

3.3 Assembly of contigs and scaffolds

The clean reads were assembled with the short-read assembler SOAPdenovo2 (http://soap.genomics.org.cn/soapdenovo.html; r240); after repeated adjustment, the best assembly was obtained with the k-mer size K set to 46. The clean reads were mapped to the contig sequences, and the contigs were then locally assembled and optimized according to the paired-end relationships and overlaps of the clean reads, finally forming the assembled scaffolds.
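SOAPdenovo2 is a de Bruijn graph assembler: reads are decomposed into k-mers (here K = 46), and contigs correspond to unbranched paths through the resulting graph. The toy sketch below illustrates that idea with a small k. It is not SOAPdenovo2's implementation and deliberately ignores reverse complements, sequencing errors, k-mer coverage cut-offs and isolated cycles, all of which a real assembler must handle.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Graph = DefaultDict[str, List[str]]
InDegree = DefaultDict[str, int]


def de_bruijn(reads: Iterable[str], k: int) -> Tuple[Graph, InDegree]:
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph: Graph = defaultdict(list)
    indeg: InDegree = defaultdict(int)
    for km in sorted(kmers):  # sorted for deterministic output
        graph[km[:-1]].append(km[1:])
        indeg[km[1:]] += 1
    return graph, indeg


def unitigs(graph: Graph, indeg: InDegree) -> List[str]:
    """Spell maximal non-branching paths (contigs that need no fork resolved)."""
    nodes = set(graph) | set(indeg)

    def one_in_one_out(v: str) -> bool:
        return indeg[v] == 1 and len(graph.get(v, [])) == 1

    contigs = []
    for v in sorted(nodes):
        if one_in_one_out(v):
            continue  # interior nodes are covered by walks from branch/end nodes
        for w in graph.get(v, []):
            path = v + w[-1]
            while one_in_one_out(w):
                w = graph[w][0]
                path += w[-1]
            contigs.append(path)
    return contigs
```

For example, the overlapping reads "ACGTTG" and "GTTGCA" with k = 4 collapse into the single contig "ACGTTGCA". The choice of K trades off in the same way as in a real assembler: larger k resolves more repeats but needs more coverage to keep the graph connected.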

 

3.4 Selection of reference genome

Genome sequence data of 35 Bt strains with complete genomes and 15 Bt strains with chromosome-level genomes were downloaded from the NCBI public database (Benson et al., 1990), and separate BLAST databases of chromosome sequences and plasmid sequences were built with the BLAST program makeblastdb (Altschul et al., 1990). Sequence similarity between the Bt strain S3076-1 genome and each strain's genome was then analyzed with ncbi-blast-2.2.27. Finally, the genomes of 15 Bt strains, including 97-27 and Al_Hakam, were selected as references in this study.

 

3.5 Construction of Bt strain S3076-1 draft genome

The clean reads were mapped to the scaffold sequences, and the paired-end relationships between reads and scaffolds were examined. If read pairs linked two scaffolds, the two scaffolds were presumed to be adjacent, possibly with an overlap, and these predicted links were used to judge whether the scaffolds could be joined. Using the genome sequences of the 15 selected Bt strains as references, the assembled scaffolds were then assigned and ordered to form the pseudo-genome of Bt strain S3076-1. Protein-coding genes in the Bt strain S3076-1 genome were then predicted and functionally annotated, and the genome was visualized with the CGView Server (http://stothard.afns.ualberta.ca/cgview_server/) (Grant and Stothard, 2008).
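The step of placing scaffolds onto reference replicons can be illustrated with a simple ordering scheme: each scaffold is assigned to the reference replicon of its best hit, scaffolds on the same replicon are sorted by hit position, reverse-strand scaffolds are reverse-complemented, and the pieces are joined with runs of N. This is a hypothetical simplification of what pseudo-contig tools do, not the DIYA implementation; the Placement fields and the 100 bp spacer are assumptions, and scaffolds without a placement are simply left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class Placement:
    scaffold: str   # scaffold ID
    replicon: str   # reference chromosome/plasmid of the scaffold's best hit
    ref_start: int  # start coordinate of that hit on the reference
    reverse: bool   # True if the hit is on the minus strand


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq[::-1].translate(_COMPLEMENT)


def build_pseudomolecules(scaffolds: Dict[str, str],
                          placements: Iterable[Placement],
                          spacer: int = 100) -> Dict[str, str]:
    """Concatenate placed scaffolds per replicon in reference order, N-padded."""
    by_replicon: Dict[str, List[Placement]] = defaultdict(list)
    for p in placements:
        by_replicon[p.replicon].append(p)
    pseudo = {}
    for replicon, ps in by_replicon.items():
        ps.sort(key=lambda p: p.ref_start)
        pieces = [revcomp(scaffolds[p.scaffold]) if p.reverse
                  else scaffolds[p.scaffold] for p in ps]
        pseudo[replicon] = ("N" * spacer).join(pieces)
    return pseudo
```

The N spacers mark unknown gap sizes between neighbouring scaffolds; a finished genome would replace them with sequence obtained by deeper sequencing or gap closure.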

 

Authors’ contributions

Wu Zhongqi was responsible for analyzing the Bt genome sequencing data and for writing and revising the manuscript; Zhou Yan was responsible for strain culture, sample sequencing and preliminary data processing; Liu Panpan and Wei Yanjun guided the analysis of the Bt genome data; Zhang Yan and Li Youzhi participated in the experimental design and manuscript revision; Liu Shenkui and Fang Xuanjun are the corresponding authors and guided the writing, revision and finalization of the manuscript. All authors have read and approved the final manuscript.

 

Acknowledgement

This project was initiated and funded by the “China Bt resources collection and identification project (BtSRI)” of the Hainan Institute of Tropical Agricultural Resources, and all intellectual property rights are owned by the Hainan Institute of Tropical Agricultural Resources.

 

References

Altschul S.F., Gish W., Miller W., Myers E.W., and Lipman D.J., 1990, Basic local alignment search tool, Journal of Molecular Biology, 215(3): 403-410

https://doi.org/10.1016/S0022-2836(05)80360-2

 

Benson D., Boguski M., Lipman D.J., and Ostell J., 1990, The National Center for Biotechnology Information, Genomics, 6(2): 389-391

https://doi.org/10.1016/0888-7543(90)90583-G

 

de Maagd R.A., Bravo A., and Crickmore N., 2001, How Bacillus thuringiensis has evolved specific toxins to colonize the insect world, Trends in Genetics, 17(4): 193-199

https://doi.org/10.1016/S0168-9525(01)02237-5

 

Grant J.R., and Stothard P., 2008, The CGView Server: a comparative genomics tool for circular genomes, Nucleic Acids Research, 36(Web Server issue): W181-W184



Höfte H., and Whiteley H.R., 1989, Insecticidal Crystal Proteins of Bacillus thuringiensis, Microbiological Reviews, 53(2): 242-255



Ibrahim M.A., Griko N., Junker M., and Bulla L.A., 2010, Bacillus thuringiensis: a genomics and proteomics perspective, Bioengineered Bugs, 1(1): 31-50

https://doi.org/10.4161/bbug.1.1.10519



Raymond B., Johnston P.R., Nielsen-LeRoux C., Lereclus D., and Crickmore N., 2010, Bacillus thuringiensis: An impotent pathogen? Trends in Microbiology, 18(5): 189-194

https://doi.org/10.1016/j.tim.2010.02.006



Roh J.Y., Choi J.Y., Li M.S., Jin B.R., and Je Y.H., 2007, Bacillus thuringiensis as a specific, safe, and effective tool for insect pest control, Journal of Microbiology & Biotechnology, 17(4): 547-559
